Parallel sparse LU factorization on different message passing platforms

Author

Kai Shen

Abstract

Several message passing-based parallel solvers have been developed for general (nonsymmetric) sparse LU factorization with partial pivoting. Existing solvers were mostly deployed and evaluated on parallel computing platforms with high message passing performance (e.g., 1–10 μs in message latency and 100–1000 Mbytes/sec in message throughput), while little attention has been paid to slower platforms. This paper investigates techniques that are specifically beneficial for LU factorization on platforms with slow message passing. In the context of the S+ distributed memory solver, we find that a significant reduction in the application message passing overhead can be attained at the cost of extra computation and slightly weakened numerical stability. In particular, we propose batch pivoting to make pivot selections in groups through speculative factorization, and thus substantially decrease the interprocessor synchronization granularity. We experimented on three message passing platforms with different communication speeds. While the proposed techniques provide no performance benefit and even slightly weaken numerical stability on an IBM Regatta multiprocessor with fast message passing, they improve performance on our test matrices by 15–460% on an Ethernet-connected 16-node PC cluster. Given the different tradeoffs of communication-reduction techniques on different message passing platforms, we also propose a sampling-based runtime application adaptation approach that automatically determines whether these techniques should be employed for a given platform and input matrix.
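The tradeoff described in the abstract can be illustrated with a toy cost model. The sketch below is not the S+ implementation or the paper's adaptation algorithm. It assumes, as a simplification, that pivot selection costs one synchronization round per column without batching and one round per batch with batching, and that each round costs one message latency. The function names, the 100 μs "slow platform" latency, the batch size of 16, and the 0.5 s extra-computation figure are all hypothetical values chosen for illustration; only the 1 μs fast-platform latency comes from the range quoted in the abstract.

```python
import math


def sync_rounds(n_columns, batch_size):
    """Synchronization rounds needed to select pivots for n_columns columns
    when pivots are chosen batch_size columns at a time (toy assumption:
    one round per batch)."""
    return math.ceil(n_columns / batch_size)


def pivot_sync_time(n_columns, batch_size, latency_s):
    """Estimated time spent waiting on pivot synchronization, assuming each
    round costs exactly one message latency."""
    return sync_rounds(n_columns, batch_size) * latency_s


def batching_pays_off(n_columns, batch_size, latency_s, extra_compute_s):
    """True if the latency saved by batching exceeds the (hypothetical)
    extra computation that speculative factorization adds."""
    unbatched = pivot_sync_time(n_columns, 1, latency_s)
    batched = pivot_sync_time(n_columns, batch_size, latency_s)
    return (unbatched - batched) > extra_compute_s


if __name__ == "__main__":
    n = 10_000       # hypothetical matrix dimension
    extra = 0.5      # hypothetical extra computation, in seconds
    for name, latency in [("fast (1 us)", 1e-6), ("slow (100 us)", 100e-6)]:
        print(name, batching_pays_off(n, 16, latency, extra))
```

Under these assumed numbers the model favors batching only on the slow platform, which mirrors the qualitative finding in the abstract. A runtime adaptation scheme would replace the assumed constants with measurements taken on the actual platform and input matrix.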


Similar resources

Evaluation of Sparse LU Factorization and Triangular Solution on Multicore Platforms

The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We include both pt...


Sparse LU factorization on the CRAY T3D

The paper describes a parallel algorithm for the LU factorization of sparse matrices on distributed memory machines, using SPMD as the programming model and PVM as the message passing interface. We address all the difficulties arising in sparse codes, such as fill-in or the dynamic movement of data inside the matrix. The cyclic distribution has been used to evenly distribute the elements onto a mesh of p...


On the Impact of Communication Latencies on Distributed Sparse LU Factorization

Sparse LU factorization offers some potential for parallelism, but at a level of very fine granularity. However, most current distributed memory MIMD architectures have too high communication latencies for exploiting all parallelism available. To cope with this, latencies must be avoided by coarsening the granularity and by message fusion. However, both techniques limit the concurrency, thereby re...


Experiments with Cholesky Factorization on Clusters of SMPs

Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...


Comparative Analysis of High Performance Solvers for 3D Elliptic Problems

The presented comparative analysis concerns two iterative solvers for 3D linear boundary value problems of elliptic type. After applying the Finite Difference Method (FDM) or the Finite Element Method (FEM) discretization a system of linear algebraic equations has to be solved, where the stiffness matrix is large, sparse and symmetric positive definite. It is well known that the preconditioned ...



Journal:

J. Parallel Distrib. Comput.

Volume 66

Publication date: 2006
